Abstract

Identification of communities in complex networks is an important topic in many fields such as sociology, biology,
and computer science. Communities are often defined as groups of related nodes or links that correspond to functional
subunits in the corresponding complex systems. While most conventional approaches have focused on discovering
communities of nodes, some recent studies have started partitioning links to find overlapping communities directly. In
this paper, we propose a new quantity function for link community identification in complex networks. Based on this 
quantity function we formulate the link community partition problem into an integer programming model which allows us 
to partition a complex network into overlapping communities. We further propose a genetic algorithm for link community 
detection which can partition a network into overlapping communities without knowing the number of communities. We 
test our model and algorithm on both artificial networks and real-world networks. The results demonstrate that the model 
and algorithm are efficient in detecting overlapping community structure in complex networks. 
Introduction

Many interesting systems can be represented as networks composed of nodes and links, such as the Internet, social and friendship networks, food webs, and citation networks [1-3]. An important topic of current interest in the area of networks is the idea of communities and their detection. Detecting communities in a network is a universal problem in many disciplines, from sociology and computer science to biology [4-6].

There are typically two kinds of communities: node communities and link communities. A node community is a dense subgraph induced by a set of nodes, where nodes
are densely connected within the subgraph, but sparsely connected 
with nodes outside of the subgraph. Most existing methods for 
community detection find a partition of network nodes, i.e. node 
communities. In this type of partition, each node is in one and only 
one community. A link community is a dense subgraph induced by 
a set of links where there are many links within the subgraph, but 
few links connecting the subgraph with the rest of the network. 
Detecting link communities in a partitioning way means to find a 
partition of network links. In this type of partition, each link is in 
one and only one community, but a node can belong to multiple 
communities, depending on the community membership of the 
links incident on it. 

Community detection has many important applications in 
different fields. For example, in biology community detection has 
been applied to find protein functional modules [7] and predict 
protein functions [8]. In sociology, community structure is an important topological feature in considering vaccination interventions for infectious diseases in contact networks [9] and in understanding viral propagation in social networks [10].

While most previous studies for community detection have 
focused on node communities, some recent works have started 
exploring link communities and cliques [11-15]. In some real-world networks, link communities could be more intuitive and informative than node communities, because a link is more likely to have a unique identity while a node often belongs to multiple groups [16-21]. For example, most individuals in society have multiple identities such as families, friends, and co-workers, whereas the link between two individuals usually exists for a dominant reason [11]. From a practical point of view, we can
naturally detect the overlapping node communities by partitioning 
the links into communities [13,16,22-25], because the links 
connected to a node could belong to different link communities 
and consequently the node could be assigned to multiple 
communities of links. 

In a recent study [11], the authors define the link density of a 
link community and the partition density to evaluate the quality of 
a link community partition. Given a network with $M$ links and $N$ nodes, $P=\{P_1,\ldots,P_C\}$ is a partition of the links into $C$ subsets. The number of links in subset $P_s$ is $m_s=|P_s|$. The number of induced nodes is $n_s=\left|\bigcup_{e_{ij}\in P_s}\{v_i,v_j\}\right|$. The link density $D_s$ of community $P_s$ is defined by

$$D_s=\frac{m_s-(n_s-1)}{n_s(n_s-1)/2-(n_s-1)}.$$

The partition density $D$ is defined as the average of $D_s$, weighted by the fraction of links in each community, i.e.,

$$D=\frac{2}{M}\sum_{s} m_s\,\frac{m_s-(n_s-1)}{(n_s-2)(n_s-1)}.$$



We can see that the maximum value of D is 1 but it can take 
values less than 0. D = 1 when each community is a clique and 
D = 0 when each community is a tree. When a network is a tree, it 
cannot be partitioned into proper communities by maximizing D, 
because there are many different optimal partitions, and each 
partition has the same partition density D = 0. For example, the 
network in Figure 1 consists of two communities with one 
overlapping node, and each community is a star graph. If we want 
to partition the network into two communities by maximizing D, it 
is difficult to find the correct result shown in Figure 1A, because 
the partitions in Figure 1B and Figure 1C also have D = 0.
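The degeneracy described above can be checked numerically. The following is a minimal sketch, not the authors' code: the function name `partition_density_D`, the node labels, and the two-star network are hypothetical stand-ins for Figure 1, and the convention that a single-link community contributes 0 is an assumption taken from the usual statement of the measure in [11].

```python
def partition_density_D(partition):
    """Partition density D of [11]; `partition` is a list of link lists."""
    M = sum(len(links) for links in partition)
    total = 0.0
    for links in partition:
        m = len(links)
        n = len({v for link in links for v in link})  # induced nodes n_s
        if n > 2:  # assumed convention: a single link (n_s = 2) contributes 0
            total += m * (m - (n - 1)) / ((n - 2) * (n - 1))
    return 2.0 * total / M

# Hypothetical tree: two stars (centres 0 and 4) sharing the leaf node 3.
intuitive = [[(0, 1), (0, 2), (0, 3)], [(4, 3), (4, 5), (4, 6)]]
skewed = [[(0, 1), (0, 2)], [(0, 3), (4, 3), (4, 5), (4, 6)]]

D_intuitive = partition_density_D(intuitive)
D_skewed = partition_density_D(skewed)
print(D_intuitive, D_skewed)
```

Every community in both partitions is a tree, so each term $m_s-(n_s-1)$ vanishes and $D$ cannot tell the two partitions apart.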

In most studies on link community partition, each link belongs 
to one and only one community. But in real-world networks, a link 
may represent more than one relation between two nodes. For 
example, two individuals from the same family are also co-workers 
in the same institute. Consequently two communities may have 
overlapping links as well. There are few results about how to 
partition a network into link communities with overlapping links. 
In this paper, we redefine the partition density of link commu- 
nities, and formulate the link community partition problem into 
integer programming models. Then we design a genetic algorithm 
for solving the link community detection problem and conduct 
validations on some artificial and real-world networks. 

Methods 

Link Community Partition Density 

Given a network with $M$ links and $N$ nodes, $P=\{P_1,\ldots,P_C\}$ is a partition of the links into $C$ subsets. The number of links in community $P_s$ is $m_s=|P_s|$. The number of nodes induced by community $P_s$ is $n_s=\left|\bigcup_{e_{ij}\in P_s}\{v_i,v_j\}\right|$. The new link density $H_s$ of community $P_s$ is defined as follows:

$$H_s=\frac{m_s}{n_s(n_s-1)/2}.$$

The new partition density $H$ is the average of $H_s$, weighted as in $D$ by the fraction of links in each community:

$$H=\frac{1}{M}\sum_{s} m_s H_s=\frac{2}{M}\sum_{s}\frac{m_s^2}{n_s(n_s-1)}.$$

We can see that the maximum value of $H$ is 1 and the minimum value of $H$ is 0. $H=1$ when each community is a clique and $H=0$ when each community is an empty graph. Given the number of
communities, we can find the optimal link community partition by 
maximizing the value of H. For the network in Figure 1, the 
partition in Figure 1A has the maximum value of H, so we can 
easily find the optimal partition by maximizing H. 
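To illustrate why maximizing $H$ separates partitions that $D$ cannot, here is a small sketch. It assumes $H_s=m_s/(n_s(n_s-1)/2)$ and takes $H$ as the link-weighted average of $H_s$ (mirroring the definition of $D$); the network is a hypothetical two-star tree standing in for Figure 1, and the function name is illustrative only.

```python
def partition_density_H(partition):
    """Link-weighted average of H_s = m_s / (n_s (n_s - 1) / 2)."""
    M = sum(len(links) for links in partition)
    total = 0.0
    for links in partition:
        m = len(links)
        if m == 0:
            continue  # an empty community contributes nothing
        n = len({v for link in links for v in link})  # induced nodes n_s
        total += m * m / (n * (n - 1))
    return 2.0 * total / M

# Hypothetical tree: two stars (centres 0 and 4) sharing the leaf node 3.
intuitive = [[(0, 1), (0, 2), (0, 3)], [(4, 3), (4, 5), (4, 6)]]
skewed = [[(0, 1), (0, 2)], [(0, 3), (4, 3), (4, 5), (4, 6)]]

H_intuitive = partition_density_H(intuitive)
H_skewed = partition_density_H(skewed)
print(H_intuitive, H_skewed)
```

On this toy network the star-by-star partition scores higher than the skewed one, whereas both have $D=0$.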



Integer Programming Model for Link Community 
Partition 

Given a network G = (V,E) with M links and N nodes, we 
assume that the number of link communities is K and find the 
optimal link community partition by maximizing the partition 
density H. This problem can be formulated into an integer 
programming model. 

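Let $V=\{v_1,v_2,\ldots,v_N\}$ be the node set of $G$, and $E=\{e_1,e_2,\ldots,e_M\}$ be the edge set of $G$. We define $R=(r_{ij})_{N\times M}$ to be the incidence matrix of network $G$, where $r_{ij}=1$ if link $e_j$ is incident to node $v_i$, and $r_{ij}=0$ otherwise. We also define binary variables $x_{js}$ and $y_{is}$ to represent the membership of link $e_j$ and node $v_i$ for link community $P_s$:

$$x_{js}=\begin{cases}1,&\text{if } e_j\in P_s,\\0,&\text{otherwise,}\end{cases}\qquad y_{is}=\begin{cases}1,&\text{if } v_i\in P_s,\\0,&\text{otherwise.}\end{cases}$$

The link community partition problem can be formulated into the following integer programming model, Model-1:

$$\max\ H=\frac{2}{M}\sum_{s=1}^{K}\frac{\left(\sum_{j=1}^{M}x_{js}\right)^{2}}{\left(\sum_{i=1}^{N}y_{is}\right)\left(\sum_{i=1}^{N}y_{is}-1\right)}\qquad(1)$$

$$\text{s.t.}\quad\begin{array}{lll}
\sum_{s=1}^{K}x_{js}=1, & j=1,2,\ldots,M & (2)\\
\sum_{j=1}^{M}r_{ij}x_{js}\le M\,y_{is}, & i=1,2,\ldots,N;\ s=1,2,\ldots,K & (3)\\
y_{is}\le\sum_{j=1}^{M}r_{ij}x_{js}, & i=1,2,\ldots,N;\ s=1,2,\ldots,K & (4)\\
x_{js}\in\{0,1\}, & j=1,2,\ldots,M;\ s=1,2,\ldots,K & (5)\\
y_{is}\in\{0,1\}, & i=1,2,\ldots,N;\ s=1,2,\ldots,K & (6)
\end{array}$$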


The objective function (1) is to maximize the new link partition density $H$. Constraint (2) means that every link belongs to exactly one community. Constraint (3) indicates that if one or more links in community $P_s$ are incident to node $v_i$, then node $v_i$ must belong to community $P_s$. Constraint (4) denotes that if node $v_i$ belongs to community $P_s$, then at least one link incident to node $v_i$ belongs to community $P_s$.
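For very small networks the model can be checked by exhaustive enumeration rather than by a solver. The sketch below is an illustration under stated assumptions, not the Lingo formulation: it assigns each link to exactly one of $K$ labels (constraint (2)), derives node membership from the incident links (which is what constraints (3) and (4) enforce), and evaluates the objective with the assumed form $H=\frac{2}{M}\sum_s m_s^2/(n_s(n_s-1))$. The test network (two triangles sharing node 3) and the function name are hypothetical.

```python
from itertools import product

def solve_model1_bruteforce(edges, K):
    """Enumerate all K^M link assignments and keep the one with maximal H."""
    M = len(edges)
    best_H, best_x = -1.0, None
    for x in product(range(K), repeat=M):  # x[j] = community of link e_j
        total = 0.0
        for s in range(K):
            links = [edges[j] for j in range(M) if x[j] == s]
            if not links:
                continue  # empty community: no contribution
            m = len(links)
            # Nodes with y_is = 1 are exactly those touched by a link in P_s.
            n = len({v for link in links for v in link})
            total += m * m / (n * (n - 1))
        H = 2.0 * total / M
        if H > best_H:
            best_H, best_x = H, x
    return best_H, best_x

# Hypothetical network: triangles {1,2,3} and {3,4,5} sharing node 3.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
best_H, best_x = solve_model1_bruteforce(edges, K=2)
print(best_H, best_x)
```

The enumeration grows as $K^M$, so it is only usable as a sanity check on toy instances.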

Since the constraint formulae are simple, we can solve the 
integer programming model by Lingo software for small networks 
to see if the model can find overlapping communities properly. 
Using the quantity function and the integer programming model, 
we are able to partition several networks into link communities, 
and obtain correct results. For example, for the network in 
Figure 2A, we can partition it into five overlapping communities 
{1,2, 3, 4, 5}, {7, 8, 9, 10, 11}, {12, 13, 14, 15}, {16, 17, 18}, {1, 
7, 12, 16}, and each community is a clique. Nodes 1, 7, 12, 16 are 
overlapping nodes. The partition density of this link community 
partition is the optimal objective function value 1. We can 
partition the network in Figure 2B into two communities with each 
being a clique. Node 1 and node 2 belong to the two communities 
and link (1, 2) belongs to the bigger community. The objective 
function value is less than 1 due to the unique community 
membership of link (1, 2). 
Figure 1. Three different partition results of a tree network. (A) Correct partition. (B, C) Two counter-intuitive partitions. The red links and their adjacent nodes constitute a community, and the blue links and their adjacent nodes form another community. The black node is the overlapping node.
doi:10.1371/journal.pone.0083739.g001



In Model-1, since every link belongs to one and only one community, we might obtain a result in which a pair of nodes belongs to the same two communities, but the link between them belongs to only one of the communities. For example, in Figure 2B, link (1, 2) belongs only to the bigger community. In fact, node 1 and node 2 may have two different relations. For example, they can be classmates and sisters at the same time, so the link (1, 2) should belong to both the classmate community and the family community. To address this drawback, we can revise Model-1 and obtain the following model, Model-2.



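$$\max\ H=\frac{2}{\sum_{s=1}^{K}\sum_{j=1}^{M}x_{js}}\sum_{s=1}^{K}\frac{\left(\sum_{j=1}^{M}x_{js}\right)^{2}}{\left(\sum_{i=1}^{N}y_{is}\right)\left(\sum_{i=1}^{N}y_{is}-1\right)}\qquad(7)$$

$$\text{s.t.}\quad\begin{array}{lll}
\sum_{s=1}^{K}x_{js}\ge 1, & j=1,2,\ldots,M & (8)\\
\sum_{j=1}^{M}r_{ij}x_{js}\le M\,y_{is}, & i=1,2,\ldots,N;\ s=1,2,\ldots,K & (9)\\
y_{is}\le\sum_{j=1}^{M}r_{ij}x_{js}, & i=1,2,\ldots,N;\ s=1,2,\ldots,K & (10)\\
x_{js}\in\{0,1\}, & j=1,2,\ldots,M;\ s=1,2,\ldots,K & (11)\\
y_{is}\in\{0,1\}, & i=1,2,\ldots,N;\ s=1,2,\ldots,K & (12)
\end{array}$$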



In Model-2, constraint (8) means that every link must belong to at least one community. A link belonging to more than one community is regarded as several links in the objective function (7). Using Model-2, we can partition the network in Figure 2B into the two communities, and link (1, 2) belongs to the two communities as well. Each community is a clique, and the optimal objective function value of this partition is 1. Figure 2C is a network consisting of two cliques that overlap in a 3-clique. This network can be partitioned into two communities, and each community is a clique. The two overlapping cliques are correctly identified, as each link in the overlapping part (the 3-clique) belongs to the two communities at the same time. The optimal objective function value of the link partition is 1. Figure 3 is an example from reference [11]. In this network, the basketball team community consists of two groups of members: one group is from the junior community, and the other is from the senior community. In other words, the basketball team group is completely subsumed in two other groups. Using Model-2, we can partition the network into three overlapping communities and correctly identify the multiple relationships in the basketball team community.

PLOS ONE | www.plosone.org | December 2013 | Volume 8 | Issue 12 | e83739

Discovering Link Communities in Complex Networks

Figure 2. Link communities of three artificial networks. (A) The network consists of five overlapping communities. Nodes 1, 7, 12, 16 are overlapping nodes; (B) The network consists of two overlapping communities. Nodes 1 and 2 are overlapping nodes that belong to the two communities, and link (1, 2) belongs to the two communities as well; (C) The network consists of two overlapping cliques and the overlapped subgraph is a 3-clique.
doi:10.1371/journal.pone.0083739.g002

Model-2 can be used to partition sparse networks (e.g., tree-like networks) or even disconnected networks. It is easy to prove that, when a network is disconnected, it can be partitioned into several connected communities. The objective function value is between 0 and 1. Before using Model-2 to partition a network, the number of communities should be given. If the number of communities is unknown, we can use Model-1 to determine it: we find the maximum partition density for every candidate number of communities, then compare all the partition densities and find the maximum one. The number of communities with the maximum partition density is the final number of communities.

Genetic Algorithm for Link Community Detection 

Although we can solve Model-2 with the Lingo software to partition small-scale networks into link communities, we cannot solve the integer programming model for large-scale networks, because the problem is NP-hard. In addition, most algorithms for community detection need some prior knowledge about the community structure, such as the number of communities, which is usually unknown in real-life networks.

In the following, we design a genetic algorithm for link community detection. The genetic algorithm (GA) was proposed in [26]. It is a global optimization method in artificial intelligence. When the solution space of a problem is too large to allow exhaustive search for exact optimal solutions, a genetic algorithm can quickly narrow the search to a relatively small region of the solution space and produce approximately optimal solutions. In [27-29], the authors designed genetic algorithms for solving the node community detection problem in unipartite or bipartite networks. In this paper, we propose a link community detection algorithm based on a hybrid of the genetic algorithm and the self-organizing mapping (SOM) algorithm, which aims to find the best link community structure by maximizing the link partition density. The algorithm does not need any prior knowledge about the number of communities, which makes it useful in real-world networks. The algorithm outputs the final link community structure and its corresponding overlapping nodes as the result and does not impose further processing on the output.

Figure 3. The network in Ref. [11] can be correctly partitioned into three communities by our model, and the objective function value is 1.
doi:10.1371/journal.pone.0083739.g003

The GA main functions. First of all, we need to design a chromosome representation encoding the solution of the link community detection problem. In our implementation, the chromosome is represented by a matrix $B=(b_{j,c})$, where $j=1,2,\ldots,M$ and $c=1,2,\ldots,K$. Each element $b_{j,c}$ is the strength with which a network link $e_j$ belongs to a community $P_c$. Note that $b_{j,c}$ ranges in the interval [0.0, 1.0]. Each link of the network is subject to the following constraint:

$$\sum_{c=1}^{K}b_{j,c}=1.\qquad(13)$$

Equation (13) normalizes the membership strengths so that the strengths of a link over all the communities sum to 1.

For each chromosome, we design a partition matrix $D=(d_{j,c})$, where $j=1,2,\ldots,M$ and $c=1,2,\ldots,K$. Each element $d_{j,c}$ is either 0 or 1. When $d_{j,c}=1$, link $e_j$ is assigned to community $P_c$; otherwise, link $e_j$ is not assigned to community $P_c$. Matrix $D$ can be calculated from matrix $B$ according to the following equation:

$$d_{j,c}=\begin{cases}1,&\text{if } b_{j,c}=\max_{1\le s\le K}b_{j,s},\\0,&\text{otherwise.}\end{cases}\qquad(14)$$
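The mapping from membership strengths to a hard assignment in formula (14) can be sketched in a few lines. This is a minimal illustration using plain Python lists; the function name `partition_matrix` and the example values are hypothetical.

```python
def partition_matrix(B):
    """Formula (14): d[j][c] = 1 where b[j][c] equals the row maximum, else 0."""
    D = []
    for row in B:
        top = max(row)
        # Exact ties would mark several communities for the same link.
        D.append([1 if b == top else 0 for b in row])
    return D

# Hypothetical chromosome for M = 3 links and K = 3 communities (rows sum to 1).
B = [[0.7, 0.2, 0.1],
     [0.1, 0.3, 0.6],
     [0.25, 0.5, 0.25]]
D = partition_matrix(B)
print(D)
```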



The network is represented by the incidence matrix $R$, the link adjacency matrix $A$, and the weighted link adjacency matrix $Q$. The link adjacency matrix is calculated as $A=R^{T}R$. In $A$, the diagonal elements are 2, and the off-diagonal elements take values in {0, 1} to represent whether two links have a common node or not. Let $Z$ be a diagonal matrix whose diagonal elements are the inverses of the node degrees, where the degree $d(v_i)$ of a node is the number of links incident to it. In other words,

$$Z=\begin{pmatrix}\frac{1}{d(v_1)}&0&\cdots&0\\0&\frac{1}{d(v_2)}&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&\frac{1}{d(v_N)}\end{pmatrix}.$$

The weighted link adjacency matrix $Q$ is defined as $Q=R^{T}ZR$, whose entries measure the probability that a random walker goes from one link to one of its adjacent links across their common node. This can be regarded as the possibility of two adjacent links belonging to the same community.
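As a concrete check of these definitions, the following sketch builds $R$, $A=R^{T}R$, and $Q=R^{T}ZR$ for a hypothetical path of three links, using plain Python lists. The function names and the 0-based node labels are illustrative choices, not part of the paper.

```python
def incidence_matrix(n_nodes, edges):
    """R[i][j] = 1 if link e_j is incident to node v_i, else 0."""
    R = [[0] * len(edges) for _ in range(n_nodes)]
    for j, (u, v) in enumerate(edges):
        R[u][j] = 1
        R[v][j] = 1
    return R

def link_matrices(R):
    """Return A = R^T R and Q = R^T Z R, with Z = diag(1 / degree)."""
    N, M = len(R), len(R[0])
    degree = [sum(R[i]) for i in range(N)]
    A = [[sum(R[i][a] * R[i][b] for i in range(N)) for b in range(M)]
         for a in range(M)]
    # Only nodes shared by both links contribute, so isolated nodes are skipped.
    Q = [[sum(1.0 / degree[i] for i in range(N) if R[i][a] and R[i][b])
          for b in range(M)] for a in range(M)]
    return A, Q

# Hypothetical path network 0 - 1 - 2 - 3 with links e_0, e_1, e_2.
edges = [(0, 1), (1, 2), (2, 3)]
R = incidence_matrix(4, edges)
A, Q = link_matrices(R)
print(A)
print(Q)
```

The diagonal of $A$ is 2 because every link has two end nodes, and an off-diagonal entry of $Q$ is the inverse degree of the node that the two links share.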



The GA Main Functions

• Input

Input the number of nodes $N$ and the number of links $M$ of the network, and the maximum number of communities $K$. Calculate the incidence matrix $R$, the link adjacency matrix $A=R^{T}R$, and the weighted link adjacency matrix $Q=R^{T}ZR$. Give the number of individuals $U$, the maximum epoch $T$, the mutation probability $p$, and the SOM parameters $\alpha,\beta,\theta$.

• Output

Output the link partition matrix $D^{*}$ and its fitness value $H^{*}$ (i.e., the link partition density value), and the node partition matrix $F$. Partition the network into communities according to $D^{*}$ and $F$.

• Initialization: $t=0$

Randomly generate an initial population $B_1(t),B_2(t),\ldots,B_U(t)$, and give initial values of $D^{*}$ and $H^{*}$.

• Step 1. Population Fitness

For all individuals in the population $B_1(t),B_2(t),\ldots,B_U(t)$, calculate the partition matrices $D_1(t),D_2(t),\ldots,D_U(t)$ and their fitness values $H_1(t),H_2(t),\ldots,H_U(t)$.

• Step 2. Population Sorting

Sort $B_1(t),B_2(t),\ldots,B_U(t)$ according to their fitness values in descending order. Suppose the sorted chromosomes are $B_1(t),B_2(t),\ldots,B_U(t)$, where $H_1(t)\ge H_2(t)\ge\cdots\ge H_U(t)$. If $H_1(t)>H^{*}$, then $D^{*}=D_1(t)$, $H^{*}=H_1(t)$. If $t=T$, then stop, output $D^{*}$ and $H^{*}$, and calculate the corresponding node partition matrix $F$. Otherwise, go to Step 3.

• Step 3. Population Crossover

For $i=1,2,\ldots,\lfloor U/2\rfloor$, let $B_i(t)$ and $B_{\lfloor U/2\rfloor+i}(t)$ cross over to produce two temporary individuals (matrices) $W_i(t)$ and $W_{\lfloor U/2\rfloor+i}(t)$. If $U$ is an odd number, then let $W_U(t)=B_U(t)$.

• Step 4. Population Mutation

Randomly select $pU$ temporary individuals (temporary matrices), and do the mutation operation on each of them.

• Step 5. Population SOM

For each temporary individual, do the SOM operation on it.

• Step 6. Population Normalization

For each temporary individual, do normalization on it. Denote the normalized individuals by $B_1(t+1),B_2(t+1),\ldots,B_U(t+1)$. Let $t=t+1$, and go to Step 1.

Partition matrix and fitness evaluation. For each individual $B_i$, calculate the partition matrix $D_i$ according to formula (14). For each community $P_s$, $1\le s\le K$, let $D_i(:,s)$ be the $s$-th column of matrix $D_i$. Then $E_i(s)=R\cdot D_i(:,s)$ is a column vector whose elements are non-negative integers. A non-zero element in $E_i(s)$ indicates that the corresponding node belongs to community $P_s$. Let $F_i(s)$ be a 0-1 vector with $f_i(j,s)=1$ whenever $e_i(j,s)\ge 1$; $f_i(j,s)=1$ means that node $v_j$ belongs to community $P_s$. The fitness of individual $B_i$ is the partition density of its partition, which can be calculated by the following equation:

$$H_i=\frac{2}{M}\sum_{s=1}^{K}\frac{\left(\sum_{j=1}^{M}D_i(j,s)\right)^{2}}{\left(\sum_{v=1}^{N}F_i(v,s)\right)\left(\sum_{v=1}^{N}F_i(v,s)-1\right)}.$$
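The fitness evaluation can be sketched directly from the incidence matrix and a partition matrix. The code below is an illustration, not the authors' MATLAB implementation: it assumes every link sits in exactly one community so that the normalizing constant is $M$, and the function names and the two-triangle network are hypothetical.

```python
def fitness(R, D):
    """Partition density of the link partition D (M x K) on a network R (N x M)."""
    N, M, K = len(R), len(D), len(D[0])
    total = 0.0
    for s in range(K):
        m_s = sum(D[j][s] for j in range(M))  # links in community s
        # E(s) = R * D(:, s): number of community-s links touching each node.
        E_s = [sum(R[v][j] * D[j][s] for j in range(M)) for v in range(N)]
        n_s = sum(1 for e in E_s if e >= 1)   # nodes with F(v, s) = 1
        if m_s > 0:
            total += m_s ** 2 / (n_s * (n_s - 1))
    return 2.0 * total / M

# Hypothetical network: triangles {0,1,2} and {2,3,4} sharing node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
R = [[1 if v in e else 0 for e in edges] for v in range(5)]

D_cliques = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
D_single = [[1, 0]] * 6

H_cliques = fitness(R, D_cliques)
H_single = fitness(R, D_single)
print(H_cliques, H_single)
```

Splitting the links triangle by triangle gives the maximal value 1, while putting all links in one community gives a lower score.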



Since there is usually a single maximum value in each row of matrix $B$, formula (14) usually assigns a link to one and only one community. When a link is an overlapping link of two communities, it cannot be detected by formula (14) directly. To identify overlapping links correctly, we can replace formula (14) by the following formula (15):

$$d_{j,c}=\begin{cases}1,&\text{if } \dfrac{b_{j,c}}{\max_{1\le s\le K}b_{j,s}}>0.5,\\0,&\text{otherwise.}\end{cases}\qquad(15)$$

Using formula (15), an overlapping link can be assigned to more than one community.

Population sorting. Sort $B_1(t),B_2(t),\ldots,B_U(t)$ according to their fitness values in descending order. Suppose the sorted chromosomes are $B_1(t),B_2(t),\ldots,B_U(t)$, where $H_1(t)\ge H_2(t)\ge\cdots\ge H_U(t)$. If $H_1(t)>H^{*}$, then $D^{*}=D_1(t)$, $H^{*}=H_1(t)$.

Population crossover. For $i=1,2,\ldots,\lfloor U/2\rfloor$, do the crossover operation on $B_i(t)$ and $B_{\lfloor U/2\rfloor+i}(t)$ by the following rules: randomly select a column $s$, revise the $s$-th column of $B_{\lfloor U/2\rfloor+i}(t)$ by the $s$-th column of $B_i(t)$, and obtain two new temporary individuals $W_i(t)$ and $W_{\lfloor U/2\rfloor+i}(t)$. Let $W_i(t)=B_i(t)$. We revise the $s$-th column of $B_{\lfloor U/2\rfloor+i}(t)$ by adding a fraction of the $s$-th column of $D_i(t)$ (where $D_i(t)$ is the partition matrix corresponding to $B_i(t)$), that is,

$$W_{\lfloor U/2\rfloor+i}(t)(:,c)=\begin{cases}B_{\lfloor U/2\rfloor+i}(:,s)+0.1\cdot D_i(:,s),&\text{if } c=s,\\B_{\lfloor U/2\rfloor+i}(:,c),&\text{if } c\ne s.\end{cases}$$
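The crossover rule can be illustrated with a short sketch. It assumes the reading in which the better individual is copied unchanged and the partner receives 0.1 times the selected column of the better individual's partition matrix; the function name, the explicit `rng` argument, and the example matrices are hypothetical.

```python
import random

def crossover(B_i, D_i, B_partner, rng):
    """Return (W_i, W_partner, s) for one crossover between B_i and its partner."""
    K = len(B_i[0])
    s = rng.randrange(K)                       # randomly selected column
    W_i = [row[:] for row in B_i]              # W_i is a copy of B_i
    W_partner = [row[:] for row in B_partner]  # start from the partner
    for j in range(len(W_partner)):
        # Only column s changes: add a fraction of D_i(:, s).
        W_partner[j][s] += 0.1 * D_i[j][s]
    return W_i, W_partner, s

rng = random.Random(0)
B_i = [[0.9, 0.1], [0.2, 0.8]]
D_i = [[1, 0], [0, 1]]
B_partner = [[0.5, 0.5], [0.5, 0.5]]
W_i, W_partner, s = crossover(B_i, D_i, B_partner, rng)
print(s, W_partner)
```

After this step the rows of the partner no longer sum to 1, which is why the algorithm ends each generation with a normalization step.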



Population mutation. According to the mutation probability $p$, randomly select $pU$ temporary individuals and do the mutation operation on each selected individual. For each selected temporary individual $W_i(t)$, randomly select two indices $j,s$ with $1\le j,s\le M$. Three mutation rules can be used in this genetic algorithm: exchange the $j$-th row and the $s$-th row in $W_i(t)$, replace the $j$-th row by the $s$-th row in $W_i(t)$, or replace the elements of the $j$-th row with randomly selected numbers in [0.0, 1.0]. The three rules lead to insignificant differences in this genetic algorithm. In the following simulation, we replace the $j$-th row with the $s$-th row in $W_i(t)$. The other elements in $W_i(t)$ remain unchanged.
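The row-replacement rule used in the simulations can be sketched as follows. The function name `mutate`, the explicit `rng` argument, and the example matrix are hypothetical; indices are 0-based here rather than 1-based as in the text.

```python
import random

def mutate(W, rng):
    """Replace a randomly chosen row j of W by a randomly chosen row s."""
    M = len(W)
    j, s = rng.randrange(M), rng.randrange(M)
    mutated = [row[:] for row in W]  # leave the input matrix untouched
    mutated[j] = W[s][:]             # link e_j inherits the strengths of e_s
    return mutated, j, s

rng = random.Random(1)
W = [[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]]
mutated, j, s = mutate(W, rng)
print(j, s, mutated)
```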

Population SOM. The Self-Organizing Mapping (SOM) process analyzes the link community ID variance of each link. If the community ID variance of a link is larger than a threshold value, then the membership strength for community $P_s$ is increased for this link and for all its neighbor links belonging to the same community, while the membership strengths of all non-neighbor links for community $P_s$ are decreased. If the community ID variance of a link is smaller than the threshold value, the membership strengths of the link and of all neighbor links belonging to the same community are decreased. This process can improve the quality of the partition by eliminating links that were wrongly placed by the random operations of the algorithm.

For $i=1,\ldots,U$, do SOM operations on individual (chromosome) $W_i$ as follows:

• Calculate its partition matrix $D_i$ from the matrix $W_i$ according to formula (14);

• For $j=1,\ldots,M$, do the following operations on link $e_j$.

• Find the community ID of link $e_j$, which corresponds to the maximum element in the $j$-th row of $D_i$ (the maximum element must be 1). Suppose the maximum element in the $j$-th row of $D_i$ is in the $s$-th column, which is $D_i(j,s)$. This means that link $e_j$ belongs to community $P_s$.

• Calculate the total number $TN(e_j)$ of adjacent links of $e_j$ (including link $e_j$), and the number of adjacent links in $TN(e_j)$ belonging to community $P_s$ (denoted by $IN(e_j)$). $TN(e_j)$ is equal to the sum of the elements in the $j$-th row of matrix $A$, which can be expressed by $TN(e_j)=A(j,:)\,I$, where $I=(1,1,\ldots,1)^{T}$, and $IN(e_j)$ can be obtained by the equation $IN(e_j)=A(j,:)\,D_i(:,s)$.

• Calculate the community ID variance $CV(e_j)$ of link $e_j$ by the following equation:

$$CV(e_j)=\frac{IN(e_j)}{TN(e_j)}.$$

If $CV(e_j)>\theta$, then

$$W_i(:,s)=W_i(:,s)+Q(:,j)\cdot\alpha-(I-A(:,j))\cdot\beta,$$

otherwise,

$$W_i(:,s)=W_i(:,s)-Q(:,j)\cdot\beta,$$

where $\alpha$ and $\beta$ are adjustable parameters that decrease with the step $t$ (in this paper, we let $\alpha=\alpha-\frac{t}{T}(\alpha-0.1)$ and $\beta=\beta-\frac{t}{T}(\beta-0.05)$). In the above equations, if an element is negative, then we set it to be 0.01.
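A single SOM update for one link can be sketched as below. This is an interpretation under stated assumptions rather than the authors' code: it reads the update as $W(:,s) \mathrel{+}= Q(:,j)\alpha-(I-A(:,j))\beta$ when $CV(e_j)>\theta$ and $W(:,s) \mathrel{-}= Q(:,j)\beta$ otherwise, clips negative entries to 0.01, and uses hard-coded $A$ and $Q$ for a hypothetical path of three links. The function name `som_update` is illustrative.

```python
def som_update(W, A, Q, j, alpha, beta, theta):
    """Apply one SOM step for link e_j; return the new matrix and CV(e_j)."""
    M = len(W)
    # Community of e_j: column of the row maximum, as in formula (14).
    s = max(range(len(W[j])), key=lambda c: W[j][c])
    # Column s of the partition matrix D derived from W.
    in_s = [1 if W[l][s] == max(W[l]) else 0 for l in range(M)]
    TN = sum(A[j])                                  # A(j, :) * I
    IN = sum(A[j][l] * in_s[l] for l in range(M))   # A(j, :) * D(:, s)
    CV = IN / TN
    new_W = [row[:] for row in W]
    for l in range(M):
        if CV > theta:
            new_W[l][s] += Q[l][j] * alpha - (1 - A[l][j]) * beta
        else:
            new_W[l][s] -= Q[l][j] * beta
        if new_W[l][s] < 0:
            new_W[l][s] = 0.01                      # clip negative strengths
    return new_W, CV

# Hypothetical path 0 - 1 - 2 - 3: link adjacency A and weighted adjacency Q.
A = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
Q = [[1.5, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.5]]
W = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7]]

W_up, cv_up = som_update(W, A, Q, j=0, alpha=1.0, beta=0.2, theta=0.2)
W_down, cv_down = som_update(W, A, Q, j=2, alpha=1.0, beta=0.2, theta=0.9)
print(cv_up, W_up)
print(cv_down, W_down)
```

In the first call the link and its neighbor gain strength in community 0 and the non-neighbor loses some; in the second call the high threshold triggers the decreasing branch.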

Normalization. Since the sum of the row elements in a temporary matrix $W_i$ might not be 1, we normalize each row of $W_i$. For $i=1,2,\ldots,U$, each row of the temporary matrix $W_i$ is divided by the sum of its elements.
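Row normalization restores constraint (13) after crossover, mutation, and SOM have changed the strengths. A minimal sketch follows; the function name and the example matrix are hypothetical, and it assumes every row has a positive sum, which the clipping of negative entries to 0.01 is meant to help ensure.

```python
def normalize_rows(W):
    """Divide each row by its sum so that the strengths of every link sum to 1."""
    normalized = []
    for row in W:
        total = sum(row)  # assumed positive
        normalized.append([w / total for w in row])
    return normalized

W = [[2.5, 0.2], [1.1, 0.4], [0.1, 0.7]]
B_next = normalize_rows(W)
print(B_next)
```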

Complexity of the genetic algorithm. The running time of 
the genetic algorithm is mainly determined by the running time of 
Step 1 and Step 5. The complexity of Step 1 is at most O(MKN), 
and the complexity of Step 5 is at most O(MKN). So the 
complexity of the genetic algorithm is O(MKNT). 

Results 

In this section, we apply the genetic algorithm to a class of 
artificial networks and several real-world networks, and analyze 
the results in terms of classification accuracy and the ability to detect meaningful communities. The algorithm is implemented in MATLAB version 7.1.

We first do validations on the networks described in Figure 2. 
By setting the parameters as described in Table 1, we can find all
the optimal partitions. Then we conduct validation experiments 
on several types of overlapping networks with special structures 
and several real-world networks. 

Ring Networks Consisting of Cliques 

We test our algorithm on a type of exemplar network, namely rings of cliques, which are not the same as those in [30-32]. This network consists of many heterogeneous cliques connected through single nodes (Figure 4A). Each clique $C_i$ ($i=1,2,\ldots,K$) is a complete graph. The network has a clear link modular structure where each community corresponds to a single clique, thus the optimal partition density is 1. Using our genetic algorithm, we can easily detect the optimal partition and identify the overlapping nodes. Figure 4A shows a network consisting of two 4-cliques and three 5-cliques. Our method obtains the optimal partition and identifies the overlapping nodes correctly.

Table 1. The parameters used in the GA algorithm for solving the link community detection problem on the networks in Figure 2.

Network   K         N    p     θ     α     β     T
A         5         40   0.3   0.2   1.0   0.2   2000
B         2         40   0.3   0.3   1.0   0.2   600
C         2,3,4,5   40   0.3   0.2   1.4   0.1   600
We also test our algorithm on an overlapping ring network of cliques. The network consists of many heterogeneous cliques, and two adjacent cliques overlap in several nodes and links (these overlapping nodes and links form a small clique) (Figure 4B). The overlapping ring-of-cliques network can be partitioned into multiple communities by our genetic algorithm, and each community is a clique. The overlapping small cliques connecting pairs of large cliques are also correctly identified.

We further validate our algorithm on a tree network of cliques. 
This network consists of multiple cliques connected by overlapping 
nodes. Many subnetworks of metabolic networks are similar to a 
tree of cliques. The network we test consists of five cliques depicted 
in Figure 4C. Using our genetic algorithm, the network can be 
partitioned into the five cliques, and the fitness (partition density) of the partition is 1.

Applications on Real-world Networks 

In this subsection, we validate our method on three real-world 
networks. 

The karate club network. The first example we consider is 
the famous karate club network analyzed by Zachary [33]. It has 
also been analyzed by many community detection studies. It 
consists of 34 members of a karate club as nodes and 78 edges 
representing friendship between members of the club which was 
observed over a period of two years. We apply our method to the 
karate club network using the parameters K=~i, N = 600, p = 0.2, 
θ = 0.2, α = 0.6, β = 0.2, T = 1000. The result is illustrated in
Figure 5A. The average link density is 0.3349. The colors of the 
links indicate the link communities detected by our genetic 
algorithm, and the colors of the nodes indicate the node 
communities deduced from link communities. In this karate club 
network, our link communities show that node 1 belongs to three 
communities, and nodes 2 and 3 belong to two communities. The 
overlapping part is a 3-clique which was not identified by previous 
methods. 

Word association network. The word association network is taken from the South Florida Free Association norm list (http://www.usf.edu/FreeAssociation/). In this list, the weight of a directed link from one word to another indicates the frequency with which the people in the survey associate the end point of the link with its starting point. The word "play" association network has been converted to an undirected one and tested in [34-36]. This network has 53 nodes representing different words and 197 association edges. Using the genetic algorithm with parameters K = 3, U = 40, p = 0.2, θ = 0.2, α = 1.0, β = 0.2, T = 10000, we can partition this network into three overlapping communities with the fitness (objective function) value 0.3396. The result is shown in Figure 5B. From the
partition results, we can see that words with frequent associations 
are in the same communities. In this network, the word "play" is 
strongly associated with most words, so it is an overlapping node. 
This result has also been obtained by a graph-theoretical method 
for node community detection [35]. 

The co-appearance network. The co-appearance network contains 77 characters in the novel Les Misérables by Victor Hugo. There are 77 nodes and 254 links in the co-appearance network. The nodes represent the 77 characters and the links connect any pair of characters that appear in the same chapter of the book. This network was compiled by Knuth [37] based on the list of characters' appearances by scene. In this paper, we use the unweighted network. Figure 5C shows the partition obtained by our genetic algorithm, which divides the network into seven overlapping communities. The resulting partition agrees reasonably well with the social divisions and subplots in the plot-line of the novel. In [16], the network is partitioned into five communities.

Figure 4. Link communities of three networks of heterogeneous cliques. (A) The ring network of heterogeneous cliques. Each community is a clique, and two adjacent communities are connected by one node. (B) The ring network of overlapping heterogeneous cliques. Each community is a clique, and two adjacent communities are connected by one node or one link. (C) The tree network of heterogeneous cliques. Each community is a clique, and two adjacent communities are overlapped by one node [11].
doi:10.1371/journal.pone.0083739.g004

From the results, we can see that this network contains some 
highly connected nodes, some of which (nodes 11, 16, 23, 29, 41, 
48, 55, 58) are overlapping nodes and can connect to multiple 
communities of the network. These nodes can cause serious 
problems if we want to partition the network by conventional node 
community schemes because they do not fit adequately to any 
community. No matter which community we place a highly 
connected node in, its outside links are more than its inside links. 



In contrast, link community schemes can provide an elegant 
solution to this problem because they allow a node to belong to 
multiple communities. As shown in Figure 5C, our algorithm 
properly places nodes 11, 16, 23, 29, 41, 48, 55, 58 into more than 
one community. These nodes correspond to the major characters 
in the novel. In addition, our algorithm also classifies the major 
characters of the novel into their proper communities. For 
example, node 48 corresponds to Gavroche, who is assigned to three communities, corresponding to his family members, his friends, and the people in the battle, respectively.
Figure 5. Link communities of some real-world networks. (A) The karate club network; (B) The word association network; (C) The co-appearance network.
doi:10.1371/journal.pone.0083739.g005
Discussion and Conclusion

Community structure is one of the main characteristics of 
complex networks and detecting community structure is very 
helpful for understanding the functions of these networks. In this 
paper, we investigate the link community detection problem and 
propose a new quantity function for link community detection. We 
formulate the link community identification problem into an 
integer nonlinear programming model based on the proposed 
quantity function. Furthermore, we design a GA algorithm for 
solving the link community detection problem and conduct 
validation experiments on some artificial and real-world networks. 

The extensive computational results demonstrate that our 
model and algorithm can detect overlapping communities 
effectively. It will be promising to apply and test our method onto 